Index Structures for Information Filtering Under the Vector Space Model
Authors
Tak W. Yan and Hector Garcia-Molina
Department of Computer Science, Stanford University, Stanford, CA 94305
Abstract
With ever-increasing volumes of electronic information being generated, users of information systems face information overload. It is therefore desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus of profiles (which represent users' long-term interests), handled by an information filtering system is potentially huge, and the system must process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study which data structures and algorithms can be used to perform large-scale information filtering efficiently under the vector space model, a retrieval model that has been established as effective. We apply the idea of the standard inverted index to index user profiles. We also devise an alternative to the standard inverted index in which, instead of indexing every term in a profile, we select only the significant terms to index. We evaluate the performance of both methods and show that they require orders of magnitude fewer I/Os to process a document than when no index is used. We also show that the proposed alternative performs better than the standard inverted index in terms of I/O and CPU processing time in many cases.
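The abstract contrasts three ways of matching an incoming document against stored profiles:

- no index;
- a standard inverted index over every profile term;
- an index over only the significant terms of each profile.

The following Python sketch illustrates all three on toy in-memory data. It is a minimal sketch under stated assumptions, not the paper's implementation. It ignores disk layout and I/O, which are what the paper actually measures.

Assumptions:

- Profiles and documents are unit-length term-weight vectors with nonnegative weights.
- A profile matches when the cosine similarity (here a plain dot product) reaches a per-profile threshold greater than zero.
- All function names are hypothetical.
- The rule for choosing "significant" terms is also an assumption. The sketch indexes a profile's heaviest terms until the Euclidean norm of the remaining weights drops below its threshold. By the Cauchy-Schwarz inequality, a document sharing no indexed term with the profile then scores at most that remaining norm, so skipping it cannot miss a match.

```python
import math
from collections import defaultdict

# Hypothetical toy model: profiles and documents are sparse dicts term -> weight,
# assumed nonnegative and normalized to unit length. Thresholds are assumed > 0.
Vector = dict[str, float]


def normalize(v: Vector) -> Vector:
    """Scale to unit Euclidean length, so cosine similarity equals the dot product."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm > 0 else dict(v)


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product, iterating over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def brute_force(doc: Vector, profiles: dict[str, Vector], thresholds: dict[str, float]) -> set[str]:
    """No index: score the document against every profile."""
    return {pid for pid, p in profiles.items() if dot(doc, p) >= thresholds[pid]}


def build_full_index(profiles):
    """Standard inverted index: every profile term -> [(profile id, weight), ...]."""
    index = defaultdict(list)
    for pid, p in profiles.items():
        for t, w in p.items():
            index[t].append((pid, w))
    return index


def filter_full(doc, index, thresholds):
    """Accumulate partial scores only for profiles that share a term with the document."""
    scores = defaultdict(float)
    for t, dw in doc.items():
        for pid, pw in index.get(t, ()):
            scores[pid] += dw * pw
    return {pid for pid, s in scores.items() if s >= thresholds[pid]}


def significant_terms(p: Vector, threshold: float) -> list[str]:
    """Assumed selection rule (not necessarily the paper's): take the heaviest terms
    until the norm of the remaining weights falls below the threshold."""
    rest = sum(w * w for w in p.values())  # squared norm of the not-yet-indexed weights
    chosen = []
    for t in sorted(p, key=p.get, reverse=True):
        if math.sqrt(max(rest, 0.0)) < threshold:  # max() guards against float drift
            break
        chosen.append(t)
        rest -= p[t] ** 2
    return chosen


def build_selective_index(profiles, thresholds):
    """Alternative index: only each profile's significant terms -> [profile id, ...]."""
    index = defaultdict(list)
    for pid, p in profiles.items():
        for t in significant_terms(p, thresholds[pid]):
            index[t].append(pid)
    return index


def filter_selective(doc, index, profiles, thresholds):
    """Collect candidates via indexed terms, then verify each with a full dot product,
    because the weights of unindexed terms were never accumulated."""
    candidates = {pid for t in doc for pid in index.get(t, ())}
    return {pid for pid in candidates if dot(doc, profiles[pid]) >= thresholds[pid]}


# Toy usage with hypothetical profiles and one incoming document.
profiles = {
    "p1": normalize({"index": 3.0, "inverted": 2.0, "vector": 1.0}),
    "p2": normalize({"filtering": 2.0, "profile": 2.0, "stream": 1.0}),
    "p3": normalize({"stock": 3.0, "market": 1.0}),
}
thresholds = {"p1": 0.5, "p2": 0.5, "p3": 0.5}
doc = normalize({"inverted": 1.0, "index": 1.0, "profile": 1.0})

print(brute_force(doc, profiles, thresholds))  # {'p1'}
print(filter_full(doc, build_full_index(profiles), thresholds))  # {'p1'}
sel_index = build_selective_index(profiles, thresholds)
print(filter_selective(doc, sel_index, profiles, thresholds))  # {'p1'}
```

The sketch makes the trade-off visible. The selective index keeps shorter posting lists, since "vector" and "stream" are never indexed here. In exchange, every candidate it returns must be re-scored against the full profile vector.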
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Empirical Mode Decomposition based Adaptive Filtering for Orthogonal Frequency Division Multiplexing Channel Estimation
This paper presents an empirical mode decomposition (EMD) based adaptive filter (AF) for channel estimation in an OFDM system. In this method, the length of the channel impulse response (CIR) is first approximated using the Akaike information criterion (AIC). Then, the CIR is estimated using an adaptive filter with the EMD-decomposed IMFs of the received OFDM symbol. The correlation and kurtosis measures are used to sel...
A Stock Market Filtering Model Based on Minimum Spanning Tree in Financial Networks
There have been several efforts in the literature to extract as much information as possible from financial networks. Most of the research has been concerned with hierarchical structures, clustering, topology, and the behavior of the market network, but no notable work on network filtration exists. This paper proposes a stock market filtering model using the correlation - ba...
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative-filtering-based recommender systems. In order to improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...
Publication date: 1994